Learning First-Order Definitions of Functions
First-order learning involves finding a clause-form definition of a relation
from examples of the relation and relevant background information. In this
paper, a particular first-order learning system is adapted specifically for
finding definitions of functional relations. This restriction leads to faster
learning times and, in some cases, to definitions that have higher predictive
accuracy. Other first-order learning systems might benefit from similar
specialization.
Comment: See http://www.jair.org/ for any accompanying file
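A functional relation has exactly one output for each combination of inputs. One plausible reason the restriction helps, offered here as an assumption rather than as the paper's account, is that negative examples become implicit: once the correct output for an input is known, any other output is wrong. The Python sketch below illustrates only that idea, not the modified learning system itself. The relation `plus/3`, the helper names, and the candidate definitions are hypothetical, and each tuple is assumed to list the inputs first and the single output last.

```python
from typing import Callable, Iterable, Tuple


def is_functional(tuples: Iterable[Tuple]) -> bool:
    """True if no input combination (all but the last field) maps to two outputs."""
    seen = {}
    for t in tuples:
        inputs, output = t[:-1], t[-1]
        if seen.setdefault(inputs, output) != output:
            return False
    return True


def score_definition(predict: Callable, examples: Iterable[Tuple]) -> Tuple[int, int]:
    """Count (correct, wrong) outputs of a candidate functional definition.

    Because the relation is functional, an output that differs from the recorded
    one is an error, so no explicit negative examples are needed.
    """
    correct = wrong = 0
    for t in examples:
        if predict(*t[:-1]) == t[-1]:
            correct += 1
        else:
            wrong += 1
    return correct, wrong


# Hypothetical examples of plus(X, Y, Z), with X and Y as inputs and Z as output.
plus = [(1, 2, 3), (2, 2, 4), (0, 5, 5)]
print(is_functional(plus))                          # True
print(score_definition(lambda x, y: x + y, plus))   # (3, 0)
print(score_definition(lambda x, y: x * y, plus))   # (1, 2)
```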
Unitary groups over local rings
Structural properties of unitary groups over local, not necessarily
commutative, rings are developed, with applications to the computation of the
orders of these groups (when finite) and to the degrees of the irreducible
constituents of the Weil representation of a unitary group associated to a
ramified extension of finite local rings.
Improved Use of Continuous Attributes in C4.5
A reported weakness of C4.5 in domains with continuous attributes is
addressed by modifying the formation and evaluation of tests on continuous
attributes. An MDL-inspired penalty is applied to such tests, eliminating some
of them from consideration and altering the relative desirability of all tests.
Empirical trials show that the modifications lead to smaller decision trees
with higher predictive accuracies. Results also confirm that a new version of
C4.5 incorporating these changes is superior to recent approaches that use
global discretization and that construct small trees with multi-interval
splits.
Comment: See http://www.jair.org/ for any accompanying file
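The abstract does not give the penalty's formula. The Python sketch below assumes the form usually associated with this work: the information gain of the best threshold on a continuous attribute is reduced by log2(N − 1)/|D|, where N is the number of distinct attribute values and |D| is the number of training cases. A test whose adjusted gain is not positive is dropped from consideration. Readers should confirm the formula against the paper. The sketch also makes simplifications of its own: it places thresholds at midpoints between adjacent distinct values and scores a single attribute by gain alone, leaving out C4.5's gain-ratio comparison across attributes. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold_with_mdl_penalty(
    values: Sequence[float], labels: Sequence[str]
) -> Optional[Tuple[float, float]]:
    """Return (threshold, penalized gain), or None if the test is eliminated."""
    pairs = list(zip(values, labels))
    n = len(pairs)
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return None
    base = entropy(list(labels))
    best_t, best_gain = None, -math.inf
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2  # simplification: midpoint between adjacent values
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    # Assumed MDL-style penalty for choosing one of N - 1 candidate thresholds.
    adjusted = best_gain - math.log2(len(distinct) - 1) / n
    if adjusted <= 0:
        return None  # the test is removed from consideration
    return best_t, adjusted


print(best_threshold_with_mdl_penalty([1, 2, 3, 4, 5, 6], list("aaabbb")))
print(best_threshold_with_mdl_penalty([1, 2, 3, 4], list("abab")))  # None
```

In the second call, the best split gains about 0.31 bits but the penalty is about 0.40 bits, so the test is eliminated. This mirrors the abstract's point that the penalty removes some tests from consideration and changes the relative ranking of the others.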
Learning a Static Analyzer from Data
To be practically useful, modern static analyzers must precisely model the
effects of both the statements in the programming language and the frameworks
used by the program under analysis. While important, manually addressing these
challenges is difficult for at least two reasons: (i) the effects on the
overall analysis can be non-trivial, and (ii) as the size and complexity of
modern libraries increase, so does the number of cases the analysis must handle.
In this paper we present a new, automated approach for creating static
analyzers: instead of manually providing the various inference rules of the
analyzer, the key idea is to learn these rules from a dataset of programs. Our
method consists of two ingredients: (i) a synthesis algorithm capable of
learning a candidate analyzer from a given dataset, and (ii) a counterexample-guided
learning procedure that generates new programs beyond those in the
initial dataset, a step that is critical for discovering corner cases and ensuring the learned
analysis generalizes to unseen programs.
We implemented our approach and instantiated it for the task of learning
JavaScript static analysis rules for a subset of points-to analysis and for
allocation sites analysis. These are challenging yet important problems that
have received significant research attention. We show that our approach is
effective: our system automatically discovered practical and useful inference
rules for many cases that are tricky to manually identify and are missed by
state-of-the-art, manually tuned analyzers.
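The two ingredients above form a counterexample-guided loop: synthesize a candidate from the current dataset, search for a program on which it disagrees with the ground truth, add that program to the dataset, and repeat. The Python sketch below illustrates only the shape of that loop. Its "programs" are hypothetical integer pairs, its "analyzer" is one rule chosen from a fixed hypothesis list, the oracle stands in for a precise ground-truth analysis, and counterexamples come from exhaustive enumeration of a tiny space rather than from the paper's program-generation procedure. Every name in it is illustrative.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Program = Tuple[int, int]         # hypothetical toy encoding of a "program"
Rule = Callable[[Program], bool]  # a candidate analysis rule
Example = Tuple[Program, bool]    # (program, ground-truth answer)


def synthesize(dataset: List[Example], candidates: Sequence[Tuple[str, Rule]]) -> Optional[Tuple[str, Rule]]:
    """Return the first candidate rule that agrees with every labelled program."""
    for name, rule in candidates:
        if all(rule(p) == label for p, label in dataset):
            return name, rule
    return None


def find_counterexample(rule: Rule, oracle: Rule, space: Sequence[Program]) -> Optional[Program]:
    """Search the program space for one where the rule and the oracle disagree."""
    for p in space:
        if rule(p) != oracle(p):
            return p
    return None


def learn_analyzer(initial, candidates, oracle, space, max_rounds=100):
    dataset = [(p, oracle(p)) for p in initial]
    for _ in range(max_rounds):
        found = synthesize(dataset, candidates)
        if found is None:
            return None, dataset            # hypothesis space exhausted
        name, rule = found
        cex = find_counterexample(rule, oracle, space)
        if cex is None:
            return name, dataset            # no disagreement in the searched space
        dataset.append((cex, oracle(cex)))  # label the counterexample and retry
    return None, dataset


def oracle(p: Program) -> bool:
    """Stand-in for a precise (but expensive) ground-truth analysis."""
    return p[0] > p[1]


candidates = [
    ("always_true", lambda p: True),
    ("first_positive", lambda p: p[0] > 0),
    ("first_gt_second", lambda p: p[0] > p[1]),
]
space = list(product(range(-2, 3), repeat=2))
name, dataset = learn_analyzer([(1, 0)], candidates, oracle, space)
print(name, len(dataset))  # first_gt_second 3
```

Starting from a single labelled program, the loop rejects two over-general rules, each after one counterexample. This shows how generated programs expose corner cases that the initial dataset does not cover.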
How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition
Data competitions rely on real-time leaderboards to rank competitor entries
and stimulate algorithm improvement. While such competitions have become quite
popular and prevalent, particularly in supervised learning formats, their
implementations by the host are highly variable. Without careful planning, a
supervised learning competition is vulnerable to overfitting, where the winning
solutions are so closely tuned to the particular set of provided data that they
cannot generalize to the underlying problem of interest to the host. Drawing on
our experience, this paper outlines some important considerations for
strategically designing relevant and informative data sets that maximize the
learning outcome of hosting a competition. It also describes a post-competition
analysis that enables robust and efficient assessment of the strengths and
weaknesses of solutions from different competitors, as well as greater
understanding of the regions of the input space that are well-solved. The
post-competition analysis, which complements the leaderboard, uses exploratory
data analysis and generalized linear models (GLMs). The GLMs not only expand
the range of results we can explore but also provide more detailed analysis
of individual sub-questions including similarities and differences between
algorithms across different types of scenarios, universally easy or hard
regions of the input space, and different learning objectives. When coupled
with a strategically planned data generation approach, the methods provide
richer and more informative summaries to enhance the interpretation of results
beyond just the rankings on the leaderboard. The methods are illustrated with a
recently completed competition to evaluate algorithms capable of detecting,
identifying, and locating radioactive materials in an urban environment.
Comment: 36 pages
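The abstract does not specify the GLMs used in the post-competition analysis. The Python sketch below assumes one natural choice: a binomial GLM with a logit link, in which the probability that an entry succeeds on a scenario depends on which algorithm produced it and on a scenario characteristic. Both covariate names (`algo_B`, `shielded`) and all counts are made up for illustration and are not results from the competition. The model is fitted by plain gradient ascent on the mean log-likelihood so that the example needs only the standard library. In practice, a statistics package's GLM routine would be the usual tool.

```python
import math

# Made-up aggregated outcomes: (algo_B, shielded, successes, trials).
CELLS = [
    (0, 0, 80, 100),
    (0, 1, 40, 100),
    (1, 0, 90, 100),
    (1, 1, 60, 100),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_binomial_glm(cells, lr: float = 1.0, iterations: int = 5000):
    """Fit logit(p) = b0 + b1*algo_B + b2*shielded by gradient ascent."""
    beta = [0.0, 0.0, 0.0]
    total = sum(cell[3] for cell in cells)
    for _ in range(iterations):
        grad = [0.0, 0.0, 0.0]
        for algo_b, shielded, successes, trials in cells:
            x = (1.0, float(algo_b), float(shielded))
            p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            for j in range(3):
                # Score of the binomial log-likelihood for this cell.
                grad[j] += x[j] * (successes - trials * p)
        beta = [b + lr * g / total for b, g in zip(beta, grad)]
    return beta


beta = fit_binomial_glm(CELLS)
print([round(b, 3) for b in beta])  # approximately [1.386, 0.811, -1.792]
```

With these toy counts, the fitted `shielded` coefficient is negative, which marks that region of the input space as harder for both algorithms. The positive `algo_B` coefficient summarizes a consistent advantage for one algorithm. Adding an interaction term would test whether that advantage changes across scenario types.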
Hydrology and Water Quality in the Central Kentucky Karst: Phase II Part A: Preliminary Summary of the Hydrogeology of the Mill Hole Sub-Basin of the Turnhole Spring Groundwater Basin
Water from upland areas flows to small ephemeral and perennial springs that feed sinking streams that are tributary to low-order cave streams. These cave streams, also recharged by diffuse percolation, are part of a dendritic network in which intermediate-order streams join high-order streams that flow to major trunk streams. The trunk in the Mill Hole Sub-basin flows across the bottom of a large karst window, Mill Hole, and joins the trunk of the Patoka Creek Sub-basin. Their combined discharge bifurcates, flows around the collapsed central core of a larger karst window, Cedar Sink, and rejoins to flow as one to Turnhole Spring, along the south bank of Green River. The location of the major trunk streams can be inferred from the position and orientation of well-defined troughs in the piezometric surface. Flow velocities over the same 5-mile distance, calculated by erroneously assuming a straight path from Parker Cave to Mill Hole, range from 60 to 1100 ft per hour, depending on whether discharge is at flood or base flow conditions. Actual velocity extremes are probably lower and higher
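The quoted velocities presumably come from dividing the straight-line 5-mile distance by the observed travel time. Under that assumption, the short Python calculation below inverts the arithmetic to recover the implied travel times: about 440 hours at 60 ft per hour and 24 hours at 1100 ft per hour. The function name is hypothetical. Because the true flow path is not straight, these figures describe the straight-line model only and not the actual conduit flow.

```python
FEET_PER_MILE = 5280


def straight_line_travel_time_hours(distance_miles: float, velocity_ft_per_hour: float) -> float:
    """Travel time implied by a straight-line distance and an average velocity."""
    return distance_miles * FEET_PER_MILE / velocity_ft_per_hour


for velocity in (60, 1100):  # ft per hour, the range quoted above
    hours = straight_line_travel_time_hours(5, velocity)
    print(f"{velocity} ft/h -> {hours:.0f} h over 5 mi")
```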
- …